[SPARK-16926] [SQL] Remove partition columns from partition metadata. #14515
bchocho wants to merge 1 commit into apache:master
This triggers the else case here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L368. cc: @andrewor14
// Note: In Hive the schema and partition columns must be disjoint sets
val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
  !catalogTable.partitionColumnNames.contains(c.getName)
Ah, good catch! It would be better to have a test proving that the unnecessary conversion object inspector is removed.
@cloud-fan I've instead created a unit test that simply checks if the number of columns in the table and partition metadata are the same for a newly created table. Since this PR has been merged already, I created a new one: #14930.
ok to test
Test build #64771 has finished for PR 14515 at commit
LGTM, merging this into 2.0 and master, thanks!
## What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.

## How was this patch tested?

Existing unit tests.

Author: Brian Cho <bcho@fb.com>

Closes #14515 from dafrista/partition-columns-metadata.

(cherry picked from commit 473d786)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
…n metadata.

## What changes were proposed in this pull request?

Add unit test for changes made in PR #14515. It makes sure that a newly created table has the same number of columns in table and partition metadata. This test fails before the changes introduced in #14515.

## How was this patch tested?

Run new unit test.

Author: Brian Cho <bcho@fb.com>

Closes #14930 from dafrista/partition-metadata-unit-test.
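The column-count check this commit message describes can be illustrated without a Spark session. The following is only a rough sketch in plain Scala: `TableMeta`, `PartitionMeta`, and `sameColumnCount` are hypothetical names invented for illustration, and the actual test added in #14930 is not reproduced here.

```scala
object PartitionMetadataCheck {
  // Hypothetical, simplified metadata holders; these are not Spark or Hive classes.
  final case class TableMeta(columns: Seq[String], partitionColumns: Seq[String])
  final case class PartitionMeta(spec: Map[String, String], columns: Seq[String])

  // The property being checked: a partition lists exactly as many columns as its table.
  def sameColumnCount(table: TableMeta, partition: PartitionMeta): Boolean =
    table.columns.size == partition.columns.size

  def main(args: Array[String]): Unit = {
    // Table column metadata excludes the partition column "ds".
    val table = TableMeta(columns = Seq("id", "value"), partitionColumns = Seq("ds"))

    // Partition metadata filtered the same way as the table.
    val fixed = PartitionMeta(Map("ds" -> "2016-08-01"), columns = Seq("id", "value"))
    // Partition metadata that still carries the partition column.
    val stale = PartitionMeta(Map("ds" -> "2016-08-01"), columns = Seq("id", "value", "ds"))

    println(s"fixed partition matches table: ${sameColumnCount(table, fixed)}")
    println(s"stale partition matches table: ${sameColumnCount(table, stale)}")
  }
}
```

Comparing counts rather than full schemas keeps the check cheap, at the cost of missing a mismatch where one column is swapped for another.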
## What changes were proposed in this pull request?
This removes partition columns from the column metadata of partitions, so that partition metadata matches table metadata.
A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not from that of partitions. As a result, TableReader concludes that the table and partition schemas differ and creates an unnecessary conversion object inspector.
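To make the mismatch concrete, here is a minimal, self-contained sketch in plain Scala. It uses no Spark or Hive classes: `Column`, `dataColumns`, and `needsConversion` are hypothetical stand-ins, and the list comparison is a simplifying assumption about how a schema difference would be detected, not TableReader's actual object inspector logic.

```scala
object PartitionSchemaSketch {
  // Hypothetical stand-in for a Hive column (name and type only).
  final case class Column(name: String, dataType: String)

  // Drop partition columns from a schema, mirroring the filter applied to table metadata.
  def dataColumns(schema: Seq[Column], partitionColumnNames: Seq[String]): Seq[Column] =
    schema.filterNot(c => partitionColumnNames.contains(c.name))

  // Simplifying assumption: a conversion is needed whenever the two column lists differ.
  def needsConversion(tableCols: Seq[Column], partitionCols: Seq[Column]): Boolean =
    tableCols != partitionCols

  val fullSchema: Seq[Column] =
    Seq(Column("id", "int"), Column("value", "string"), Column("ds", "string"))
  val partitionColumnNames: Seq[String] = Seq("ds")

  // Table metadata already excludes the partition column.
  val tableCols: Seq[Column] = dataColumns(fullSchema, partitionColumnNames)

  // Before the change: partition metadata still carries the partition column.
  val partitionColsBefore: Seq[Column] = fullSchema

  // After the change: partition metadata is filtered the same way as the table.
  val partitionColsAfter: Seq[Column] = dataColumns(fullSchema, partitionColumnNames)

  def main(args: Array[String]): Unit = {
    println(s"before: needsConversion = ${needsConversion(tableCols, partitionColsBefore)}")
    println(s"after:  needsConversion = ${needsConversion(tableCols, partitionColsAfter)}")
  }
}
```

Under these assumptions, the unfiltered partition schema differs from the table schema only by the partition column, which is exactly the difference this change removes.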
## How was this patch tested?
Existing unit tests.